Familia: An Open-Source Toolkit for Industrial Topic Modeling
Authors
Abstract
Familia is an open-source toolkit for pragmatic topic modeling in industry. It abstracts the industrial uses of topic modeling into two paradigms: semantic representation and semantic matching. Efficient implementations of both paradigms are made publicly available for the first time. We also provide off-the-shelf topic models trained on large-scale industrial corpora, including Latent Dirichlet Allocation (LDA), SentenceLDA, and Topical Word Embedding (TWE). Finally, we describe typical applications successfully powered by topic modeling, to ease the confusion and difficulty software engineers face when selecting and using topic models.
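To make the two paradigms concrete, the following is a minimal sketch and not Familia's actual API. It assumes that semantic representation yields a per-document topic distribution and that semantic matching compares two such distributions. The function names, the three-topic vectors, and the choice of Hellinger distance and Jensen-Shannon divergence as the comparison measures are all illustrative assumptions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; it is non-zero wherever p or q is non-zero.
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical topic distributions over three topics (semantic representation).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.05, 0.15, 0.8]

# Semantic matching: a smaller value means the documents are topically closer.
print("a vs b:", hellinger(doc_a, doc_b), jensen_shannon(doc_a, doc_b))
print("a vs c:", hellinger(doc_a, doc_c), jensen_shannon(doc_a, doc_c))
```

In this toy setup, doc_a and doc_b share a dominant topic and therefore score as closer than doc_a and doc_c under both measures.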
Similar resources
OpenNMT: Open-Source Toolkit for Neural Machine Translation
We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as we...
Visual Agent
Repast is a widely used, free, and open-source agent-based modeling and simulation toolkit. Three Repast platforms are currently available, each of which has the same core features but a different environment for these features. Repast Simphony (Repast S) extends the Repast portfolio by offering a new approach to simulation development and execution. This tutorial presents a simple “boids” styl...
Understanding climate change tweets: an open source toolkit for social media analysis
Collective awareness about climate change is an ongoing problem because there is such a wealth of information available, which can be confusing, contradictory and difficult to interpret. In order to help citizens understand environmental concerns, and to help organisations better inform and target interested people with campaigns, we have developed an open source toolkit to analyse social media...
Version Control for Models: From Research to Industry and Back Again
Version control for models, including model diff & merge, is not only a crucial prerequisite for widespread adoption of model-based engineering in industry; it has also been a popular and very active research topic for more than ten years. Several important algorithms and approaches emerged in the past to support the identification of differences among model versions, as well as to ...
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
Topic models based on latent Dirichlet allocation and related methods are used in a range of user-focused tasks including document navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and pr...
Journal title:
- CoRR
Volume abs/1707.09823, Issue -
Pages -
Publication date 2017